Model-Based Noise Robust Speech Recognition (Nanyang Technological University)

Authors

Nguyen Duc Hoang Ha

Haizhou Li

Abstract

Noise robustness is a challenging problem when automatic speech recognition (ASR) systems are deployed in real-life applications. This report examines techniques to improve the robustness of ASR systems. In particular, we focus on a group of model-based noise-robust techniques, called the vector Taylor series (VTS) method, that adapt the acoustic model of an ASR system towards noisy test data using knowledge of the noise corruption process. In this report, the VTS method is extended to efficiently handle non-stationary additive noise and convolutional noise. The first work in this report improves the VTS method to handle non-stationary additive noise. In the conventional VTS method, a single Gaussian is usually used to model the noise, but this is insufficient for non-stationary noise. Although using Gaussian mixture models can improve the modeling of noise, it significantly increases model complexity and computational cost. To avoid these drawbacks, we propose to first use a modified spectral subtraction method to reduce the non-stationary characteristics of the additive noise in the speech, and then apply the VTS method using only a single Gaussian noise model. In the modified spectral subtraction method, the noise characteristics are normalized towards a single Gaussian noise model; hence the method is called noise normalization VTS (NN-VTS). In addition, the mismatch function and the acoustic model compensation of the VTS method are modified to account for the remaining noise in the features. An initial study on the Aurora-2 task shows that NN-VTS can improve the performance of the VTS method in non-stationary environments. The second work in this report extends the VTS method to handle features processed with cepstral mean normalization (CMN). CMN is an efficient way to reduce channel distortions in speech features. However, the conventional VTS method is unable to work ...
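The abstract refers to a mismatch function and to CMN without stating either. As a rough illustration only, the sketch below uses the commonly cited log-spectral mismatch function, y = x + h + log(1 + exp(n - x - h)), and plain per-utterance mean subtraction. It is not the modified mismatch function or the NN-VTS procedure proposed in the report; the function names `noisy_log_energy` and `cepstral_mean_normalize` and all numeric values are hypothetical.

```python
import math


def noisy_log_energy(x, n, h=0.0):
    """Textbook log-spectral mismatch function (an assumption, not the report's).

    x: clean-speech log energy, n: additive-noise log energy,
    h: channel (convolutional) log gain. Returns log(exp(x + h) + exp(n)),
    which equals x + h + log(1 + exp(n - x - h)).
    """
    a = x + h
    m = max(a, n)
    # Factor out the larger term so the exponential cannot overflow.
    return m + math.log1p(math.exp(-abs(a - n)))


def cepstral_mean_normalize(frames):
    """Subtract the per-dimension mean taken over all frames of one utterance."""
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


if __name__ == "__main__":
    # Equal speech and noise energy raises the log energy by log(2).
    print(noisy_log_energy(0.0, 0.0))

    # A fixed channel adds a constant offset to every cepstral frame;
    # mean subtraction removes that offset.
    clean = [[1.0, 2.0], [3.0, 4.0]]
    offset = [0.5, -1.0]  # hypothetical channel offset
    distorted = [[c + o for c, o in zip(f, offset)] for f in clean]
    print(cepstral_mean_normalize(clean))
    print(cepstral_mean_normalize(distorted))
```

When the noise term is far below the speech term, the mismatch function returns approximately x + h, so the channel reduces to a constant offset that mean subtraction removes. With significant additive noise the offset is no longer constant across frames, and mean subtraction alone no longer cancels it.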

Similar resources

Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...

Intelligent Audio, Speech, and Music Processing Applications

(1) Digital Signal Processing Laboratory, School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798; (2) Department of Electrical Engineering, Northern Illinois University, DeKalb, IL 60115, USA; (3) Center for Robust Speech Systems (CRSS), Department of Electrical Engineering, Erik Jonsson School of Engineering and Computer Science, University of ...

A new method for missing-data-based robust speech recognition using a bidirectional neural network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (the spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...

A hybrid refinement scheme for intra- and cross-corpora phonetic segmentation

Sixuan Zhao (a,*), Ing Yann Soon (a), Soo Ngee Koh (a), Kang Kwong Luke (b). (a) School of Electrical & Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore; (b) School of Humanities & Social Sciences, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore

Robustizing robust M-estimation using deterministic annealing

S. Z. Li, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798. Abstract: This paper presents a modified robust M-estimator, referred to as the annealing M-estimator (AM-estimator), to avoid problems with the M-estimator. The AM-estimator combines the annealing technique into the...

Publication date: 2012
